{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yzGy7Oi-EoNo"
   },
   "source": [
    "## Retrieval Augmented Generation with LanceDB  \n",
    "\n",
    "**Objective:**\n",
    "Use Llama 2.0, Langchain and LanceDB to create a Retrieval Augmented Generation (RAG) system.\n",
    "\n",
    "This will allow us to ask questions about our documents (that were not included in the training data), without fine-tunning the Large Language Model (LLM).\n",
    "\n",
    "Here Text Splitting will help LLM to give accurate answers without hallucination.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ENgV8B98FP37"
   },
   "source": [
    "## What is a Retrieval Augmented Generation (RAG) system?\n",
    "\n",
    "Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information. Implementing RAG in an LLM-based question answering system has two main benefits:\n",
    "1. It ensures that the model has access to the most current, reliable facts, and that users have access to the model’s sources, ensuring that its claims can be checked for accuracy and ultimately trusted.\n",
    "2. RAG has additional benefits. By grounding an LLM on a set of external, verifiable facts, the model has fewer opportunities to pull information baked into its parameters. This reduces the chances that an LLM will leak sensitive data, or ‘hallucinate’ incorrect or misleading information.\n",
    "\n",
    "\n",
    "The orchestration of the retriever and generator will be done using Langchain. A specialized function from Langchain allows us to create the receiver-generator in one line of code.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gIFJpZ_3D-wu",
    "outputId": "e3182a35-a6e3-4dcf-e2f5-c55cf6aeb904"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.46.2)\n",
      "Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (1.1.1)\n",
      "Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (0.8.0)\n",
      "Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.3.7)\n",
      "Collecting langchain_community\n",
      "  Downloading langchain_community-0.3.9-py3-none-any.whl.metadata (2.9 kB)\n",
      "Requirement already satisfied: xformers in /usr/local/lib/python3.10/dist-packages (0.0.28.post3)\n",
      "Requirement already satisfied: bitsandbytes in /usr/local/lib/python3.10/dist-packages (0.44.1)\n",
      "Requirement already satisfied: lancedb in /usr/local/lib/python3.10/dist-packages (0.16.0)\n",
      "Requirement already satisfied: sentence_transformers in /usr/local/lib/python3.10/dist-packages (3.2.1)\n",
      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.16.1)\n",
      "Requirement already satisfied: huggingface-hub<1.0,>=0.23.2 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.26.2)\n",
      "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.26.4)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (24.2)\n",
      "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0.2)\n",
      "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2024.9.11)\n",
      "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.32.3)\n",
      "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.4.5)\n",
      "Requirement already satisfied: tokenizers<0.21,>=0.20 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.20.3)\n",
      "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.6)\n",
      "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)\n",
      "Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.5.1+cu121)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.36)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.11.2)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Requirement already satisfied: langchain-core<0.4.0,>=0.3.15 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.3.19)\n",
      "Requirement already satisfied: langchain-text-splitters<0.4.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.3.2)\n",
      "Requirement already satisfied: langsmith<0.2.0,>=0.1.17 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.1.143)\n",
      "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.9.2)\n",
      "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (9.0.0)\n",
      "Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)\n",
      "  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)\n",
      "Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain_community)\n",
      "  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)\n",
      "Collecting langchain\n",
      "  Downloading langchain-0.3.9-py3-none-any.whl.metadata (7.1 kB)\n",
      "Collecting langchain-core<0.4.0,>=0.3.15 (from langchain)\n",
      "  Downloading langchain_core-0.3.21-py3-none-any.whl.metadata (6.3 kB)\n",
      "Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)\n",
      "  Downloading pydantic_settings-2.6.1-py3-none-any.whl.metadata (3.5 kB)\n",
      "Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (4.12.2)\n",
      "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.4.2)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1.4)\n",
      "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (2024.10.0)\n",
      "Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (1.13.1)\n",
      "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch>=1.10.0->accelerate) (1.3.0)\n",
      "Requirement already satisfied: deprecation in /usr/local/lib/python3.10/dist-packages (from lancedb) (2.1.0)\n",
      "Requirement already satisfied: nest-asyncio~=1.0 in /usr/local/lib/python3.10/dist-packages (from lancedb) (1.6.0)\n",
      "Requirement already satisfied: pylance==0.19.2 in /usr/local/lib/python3.10/dist-packages (from lancedb) (0.19.2)\n",
      "Requirement already satisfied: overrides>=0.7 in /usr/local/lib/python3.10/dist-packages (from lancedb) (7.7.0)\n",
      "Requirement already satisfied: pyarrow>=12 in /usr/local/lib/python3.10/dist-packages (from pylance==0.19.2->lancedb) (17.0.0)\n",
      "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from sentence_transformers) (1.5.2)\n",
      "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from sentence_transformers) (1.13.1)\n",
      "Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from sentence_transformers) (11.0.0)\n",
      "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.4.3)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (24.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.5.0)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.1.0)\n",
      "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (0.2.0)\n",
      "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.17.2)\n",
      "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)\n",
      "  Downloading marshmallow-3.23.1-py3-none-any.whl.metadata (7.5 kB)\n",
      "Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)\n",
      "  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n",
      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.10/dist-packages (from langchain-core<0.4.0,>=0.3.15->langchain) (1.33)\n",
      "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.17->langchain) (0.27.2)\n",
      "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.17->langchain) (3.10.11)\n",
      "Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from langsmith<0.2.0,>=0.1.17->langchain) (1.0.0)\n",
      "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n",
      "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.23.4)\n",
      "Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain_community)\n",
      "  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.4.0)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.10)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.2.3)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2024.8.30)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.1.1)\n",
      "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence_transformers) (1.4.2)\n",
      "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence_transformers) (3.5.0)\n",
      "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->langsmith<0.2.0,>=0.1.17->langchain) (3.7.1)\n",
      "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->langsmith<0.2.0,>=0.1.17->langchain) (1.0.7)\n",
      "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->langsmith<0.2.0,>=0.1.17->langchain) (1.3.1)\n",
      "Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->langsmith<0.2.0,>=0.1.17->langchain) (0.14.0)\n",
      "Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.10/dist-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.4.0,>=0.3.15->langchain) (3.0.0)\n",
      "Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain_community)\n",
      "  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.10.0->accelerate) (3.0.2)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx<1,>=0.23.0->langsmith<0.2.0,>=0.1.17->langchain) (1.2.2)\n",
      "Downloading langchain_community-0.3.9-py3-none-any.whl (2.4 MB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m76.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hDownloading langchain-0.3.9-py3-none-any.whl (1.0 MB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m47.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hDownloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)\n",
      "Downloading httpx_sse-0.4.0-py3-none-any.whl (7.8 kB)\n",
      "Downloading langchain_core-0.3.21-py3-none-any.whl (409 kB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m409.5/409.5 kB\u001b[0m \u001b[31m32.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hDownloading pydantic_settings-2.6.1-py3-none-any.whl (28 kB)\n",
      "Downloading marshmallow-3.23.1-py3-none-any.whl (49 kB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.5/49.5 kB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hDownloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
      "Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
      "Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
      "Installing collected packages: python-dotenv, mypy-extensions, marshmallow, httpx-sse, typing-inspect, pydantic-settings, dataclasses-json, langchain-core, langchain, langchain_community\n",
      "  Attempting uninstall: langchain-core\n",
      "    Found existing installation: langchain-core 0.3.19\n",
      "    Uninstalling langchain-core-0.3.19:\n",
      "      Successfully uninstalled langchain-core-0.3.19\n",
      "  Attempting uninstall: langchain\n",
      "    Found existing installation: langchain 0.3.7\n",
      "    Uninstalling langchain-0.3.7:\n",
      "      Successfully uninstalled langchain-0.3.7\n",
      "Successfully installed dataclasses-json-0.6.7 httpx-sse-0.4.0 langchain-0.3.9 langchain-core-0.3.21 langchain_community-0.3.9 marshmallow-3.23.1 mypy-extensions-1.0.0 pydantic-settings-2.6.1 python-dotenv-1.0.1 typing-inspect-0.9.0\n"
     ]
    }
   ],
   "source": [
    "# Installation\n",
    "\n",
    "!pip install transformers accelerate einops langchain langchain_community xformers bitsandbytes lancedb sentence_transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "tfb94kwvLtU1"
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "from torch import cuda, bfloat16\n",
    "import torch\n",
    "import transformers\n",
    "from transformers import AutoTokenizer\n",
    "from time import time\n",
    "import lancedb\n",
    "from langchain_community.llms import HuggingFacePipeline\n",
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
    "from langchain.chains import RetrievalQA\n",
    "from langchain_community.vectorstores import LanceDB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "NGvhj6XWI2YV",
    "outputId": "9dbca1b6-83a6-4c4d-f72a-a97b59efc6eb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-12-04 19:20:46--  https://gist.githubusercontent.com/PrashantDixit0/10fd4ab8a7d0d37de361af2a06ecfbe2/raw/indianEconomy.txt\n",
      "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...\n",
      "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 2620 (2.6K) [text/plain]\n",
      "Saving to: ‘indianEconomy.txt’\n",
      "\n",
      "indianEconomy.txt   100%[===================>]   2.56K  --.-KB/s    in 0s      \n",
      "\n",
      "2024-12-04 19:20:46 (46.0 MB/s) - ‘indianEconomy.txt’ saved [2620/2620]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Dataset(in txt format)\n",
    "\n",
    "!wget https://gist.githubusercontent.com/PrashantDixit0/10fd4ab8a7d0d37de361af2a06ecfbe2/raw/indianEconomy.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "8IoDr-jZJFVm"
   },
   "outputs": [],
   "source": [
    "# load data\n",
    "\n",
    "dataloader = TextLoader(\"indianEconomy.txt\", encoding=\"utf8\")\n",
    "documents = dataloader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sh05Xli7MJMh"
   },
   "source": [
    "## Text Chunking\n",
    "\n",
    "We have discussed various types Text Chunking for LLM Applications and Tips and Tricks related to it.\n",
    "\n",
    "Refer - https://medium.com/p/a420efc96a13/edit\n",
    "\n",
    "Here you can try out different Text Spplitting Strategies according to your data and Tips and Tricks discussed in Blog.\n",
    "\n",
    "For Now, we are going to use Recursive Text Splitting using LangChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "r0om7g1nMEOk"
   },
   "outputs": [],
   "source": [
    "recursive_text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=10000, chunk_overlap=200\n",
    ")\n",
    "all_splits = recursive_text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DxqPNUpgNprN"
   },
   "source": [
    "## Embeddings Generator\n",
    "\n",
    "Creating embeddings using Sentence Transformer with HuggingFace embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 535,
     "referenced_widgets": [
      "e7c9395bf3bd4194aa4a0ba77a3b123b",
      "65df3c8f5a4748949cf615f87eca6001",
      "e11f8efe7ef84c70acf968f1c9141b8a",
      "67e46069fb0547d09b07a93d30c05b17",
      "e2cf9c1c757642e8af2fe87ea28585aa",
      "ff23e25e57f442c9a599e3f0d18d3899",
      "e5aa85cf98354d97b79132705841b2e0",
      "78fb0335ca174fcba3630c26bf770924",
      "e892dff7de8d4e2ba0d8445e22e56799",
      "4dbec5f502dd45c8be062671df75e03f",
      "779dd7a2742b45cba805e5752fe2e78f",
      "d53fe0df67cd4626af72209816f4cc15",
      "fca16292e1bb4b5d869b7f385c1b0224",
      "fbae5aa480ac488f96c4758c0432b0e3",
      "bfd61cb0471b4457a456ab8a83c0894f",
      "59e7f1ea04c045459fb84b6d106d8bfb",
      "e6441e5c51e04c8fab98c43dd28617e0",
      "0361dfaa82564cb086d88820f069b1cc",
      "b7aee27919a14ae5a2e9b285e93e09c7",
      "f66ce77ac93548628574ec2f87df814a",
      "96d784222029407b8f4e289983227634",
      "e5546a8a0afc4dc1849cf8e448b6d737",
      "bff9c8e2c3cd42da9a6aeccb539988c1",
      "6499d50f7bde465297d22daee4b50fa1",
      "ad633d5f1b9c4e298833ccf228c94748",
      "957df4823bf242bbbcecd279b997d983",
      "99074b2c108b4eeb83fa01cc95979543",
      "f7dca0abfaf74c33a14d189dbd9d208f",
      "27ee0cc1d8614455a233269a27c34782",
      "266e91f0426446208b3ea6a90292fa56",
      "7bff7531762a4925806a40fd5fab8fd6",
      "549d93e113f149c5b9c35afc67f66ef5",
      "2454b1f8b172493d8a5e9eb85a7d0bb0",
      "5d48b6f4848a41a6afd3cf4539a92be2",
      "3fb9907f6d83457db71886500fa32a67",
      "b36788ca19484c66b4abf9f27ba6bad1",
      "81d1d03f68ea4f87891bbf5ed7fbeff8",
      "bbc128c2593141e09d2239864ed14138",
      "1ce8d7b334d648d2bab760f85c1d1947",
      "96a565943f5b4376b1f7ece81efdc868",
      "220f314e890542398d9e58cc930e4a7a",
      "29134bc2eabf4a9b859636d6f7c16a56",
      "b0f9f249c79f4301a3412179a269c088",
      "48d96e9f41a84977b99db59e10cc03f9",
      "97cbfaa457d14ee8979c4a4964bc072b",
      "57de22f2e9e3413696106554fd777190",
      "4998b29f171845928d09a1c990282ebc",
      "c89101a8320f46c785f3c9bb3ded0c9a",
      "e49a8deda65a40bba9582fe40fb34e8c",
      "e346f8db63104d4997178cbb598913b1",
      "c53231c33f9a413ba5702413ed98e2f8",
      "531739157a19426d87eaacf716b612cb",
      "7b6817b2297c448eac14f85319694164",
      "7b8d1156ca1f4cc0ae6888be9e610d25",
      "8a10d0bc3a3d4588baedd84fe2eb57eb",
      "b137a99bc3c948c683b68e500fc1cd08",
      "6ae977c865f348d8b59d2e1786894002",
      "995ff125277e4fe3a3d6bb112c79a9df",
      "fa9d522f12fe48c7af56c84725d06e27",
      "12740113983e44ba9d50fe2d850d4869",
      "462379eceaf34bb59b5d785e1b9b3ea7",
      "528ecf1507994d829cb6b0e40b22d8aa",
      "df5d836e16a34b89813df8393dae818b",
      "f69eb6b0486542a5a5ceca7f5a90aaae",
      "bb5c8bf465624a2ea8104e8114eb5e60",
      "35c4cc943c4247ef8e18bec0e65f0075",
      "21113d0df0134b6abec86e9f13aa8260",
      "c6b5923dfb8440468a78dc140c099faf",
      "17558a4760414c04b14ef8ba37878e51",
      "40a4e000134145dd83893969b31e3824",
      "81df00c1454340caa5258228dc7a2008",
      "c66835103e2149a284c15566540dfb17",
      "6050bcd81c694d119adb8d7a48bd31f6",
      "06ba3152f47e4e629b1a485c1bb90db1",
      "bd70f2ddff7746f5a958f6332a935d48",
      "ce221ee1faac41a0907d9c596763cad6",
      "56fb4d35c9fe4490b078570cba2d858d",
      "4698c70fdf4c47e6811f5c999756c35c",
      "d0f423daa199477fba270f80272fb7b1",
      "c872b67db453404ea03188e0ae17b864",
      "0b63ae271b25438cbf18feac54182d80",
      "1c65ad9e384142b09416d42553b04fa8",
      "c2d78a32aa744f3f99c4a0dae90ee0ea",
      "7793eb2e196949ad92a0f8a9bc29c56b",
      "d3bb14829e6743a68a6c9c383b787bb2",
      "c7caed7992b449c8923cd411c7056944",
      "af5ba7d356794454bfee671eda6c8cff",
      "87a2927a29dc482fbb1c648a9bbf13d9",
      "bbe4d1e33cc54836bbd764deeea755a0",
      "277669a490a04eb7a0eb226f12feebf0",
      "2fda2f47c0784c339044eb747a93355c",
      "7905061441b44a45bb14a6f0eba65ae5",
      "e38433a0ff5a412184385a5994c54609",
      "163eb69848574085a70346c19479c224",
      "b28e4ef48f4647eca74ebc5e1829a55a",
      "8ee9ce0f3b2c4a0595e31d408681a73f",
      "b106894589204993a956fccc0b225a94",
      "5ec30abb0dd94fd3968c903c38b4653e",
      "c8e080cb2fee4c3baf4589b8a9504055",
      "317c0b2a32424aa3a7fd0a3ca458e25f",
      "c3a1f6cf9ea943caaed245a98b29df1b",
      "ae5410bb97bc4c1d9c0cb1ad59a5f01f",
      "37d17026dcc748839c1c30edbb4d1959",
      "0d0a90f4afb14d8b95c3953814a55a9c",
      "416fcb8a54ac4996b239881296c98e45",
      "c7fe91d311194ab99ba0c90392a8bb46",
      "bbd2b8f0a44f4c0890f7ea7650afb846",
      "641e058b841f468989a50a56ee8e9708",
      "ab8635367797423a9a2b0e0894326546",
      "2bff5b0e87584516901370aeb7cd574a",
      "05d1c062125844d4a91afb8e8a20f790",
      "eb5543f75c1d4728aface6b397e5b17a",
      "4067548ab10f4750b211ec9e7cadc502",
      "13143df62edd43c6b2034718e2bf899a",
      "d39e42d74c734047bc41ce8f728dd21b",
      "ad548eaa50704a65a8fda0250eb9e61b",
      "ac1424bd54f946e88ad181c9ce930bcb",
      "19b0da208f5644deaada3ae7115a5d22",
      "8043da06c21b428b878f53367a8bee2d",
      "e718dbda2d3e4e9aa9c0c3a114690cc0",
      "2efb703f083c4b61a23f19bb0fddd706"
     ]
    },
    "id": "UCuE42JMNmrz",
    "outputId": "1e5e0688-8cdb-4b5c-bae5-2b748a60035e"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-8-78fd272816e3>:4: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-huggingface package and should be used instead. To use it run `pip install -U :class:`~langchain-huggingface` and import as `from :class:`~langchain_huggingface import HuggingFaceEmbeddings``.\n",
      "  embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)\n",
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e7c9395bf3bd4194aa4a0ba77a3b123b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d53fe0df67cd4626af72209816f4cc15",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bff9c8e2c3cd42da9a6aeccb539988c1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5d48b6f4848a41a6afd3cf4539a92be2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "97cbfaa457d14ee8979c4a4964bc072b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b137a99bc3c948c683b68e500fc1cd08",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "21113d0df0134b6abec86e9f13aa8260",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4698c70fdf4c47e6811f5c999756c35c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bbe4d1e33cc54836bbd764deeea755a0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "317c0b2a32424aa3a7fd0a3ca458e25f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "05d1c062125844d4a91afb8e8a20f790",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "model_name = \"sentence-transformers/all-mpnet-base-v2\"\n",
    "model_kwargs = {\"device\": \"cuda\"}\n",
    "\n",
    "embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_Og-UmALOYwV"
   },
   "source": [
    "## LanceDB for vector storage and searching\n",
    "\n",
    "Initialize LanceDB with the Recursive Text Chunking, the *embeddings*   Sentence Transformer object will be used for extract embeddings from all the text splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "vqO6knYcOe1K"
   },
   "outputs": [],
   "source": [
    "db = lancedb.connect(\"/tmp/lancedb\")\n",
    "table = db.create_table(\n",
    "    \"rag_table\",\n",
    "    data=[\n",
    "        {\n",
    "            \"vector\": embeddings.embed_query(\"Indian Economoy\"),\n",
    "            \"text\": \"Current and future details of Indian Economy\",\n",
    "            \"id\": \"1\",\n",
    "        }\n",
    "    ],\n",
    "    mode=\"overwrite\",\n",
    ")\n",
    "\n",
    "vectordb = LanceDB.from_documents(documents, embeddings, connection=db)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zBW-Yp7mPxQY"
   },
   "source": [
    "## Initialize RAG Chain\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Aa95aAbP6ukc"
   },
   "source": [
    "## Chat Models\n",
    "\n",
    "Here you can change to any other LLM for Chat Model.\n",
    "\n",
    "Refer to LangChain, There are few Chat Models which can be used as Chat model to generate answers in RAG.\n",
    "https://python.langchain.com/docs/integrations/chat/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-3oNrEEZy6qe",
    "outputId": "79e9eeac-0672-4f73-db56-b7d66c59ae44"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-11-2024129c7277>:5: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import ChatOpenAI``.\n",
      "  llm = ChatOpenAI()\n"
     ]
    }
   ],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\"\n",
    "llm = ChatOpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "zdhU5x1aPwma"
   },
   "outputs": [],
   "source": [
    "# Retreiver\n",
    "retriever = vectordb.as_retriever()\n",
    "\n",
    "qa = RetrievalQA.from_chain_type(\n",
    "    llm=llm, chain_type=\"stuff\", retriever=retriever, verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8XxQSWUHOiPG",
    "outputId": "e889c01d-f0b1-4636-bb4c-db834aeba6b6"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-13-8558b0bda784>:5: LangChainDeprecationWarning: The method `Chain.__call__` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.\n",
      "  result = qa({\"query\": query})\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "{'query': 'What is growth of Indian Economy?', 'result': 'The Indian economy grew at a healthy rate of 7.8 percent in the first quarter of the ongoing financial year. Economists at the Reserve Bank of India have pegged growth at 6.8 percent in the second quarter, which is marginally higher than expectations. Overall, the economy is showing signs of sustained momentum, with various economic indicators pointing towards growth in different sectors. However, there are some concerns such as sluggish global demand affecting exports, lack of broad-based pick-up in the investment cycle, and challenges in creating high-quality jobs for the increasing labor force. Additionally, there are concerns about rising household borrowings and potential implications of a credit slowdown on the overall economic growth.'}\n"
     ]
    }
   ],
   "source": [
    "# Results of RAG\n",
    "\n",
    "query = \"What is growth of Indian Economy?\"\n",
    "\n",
    "result = qa({\"query\": query})\n",
    "\n",
    "print(result)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
